Analysis of Crime Reports in Chicago
Lindsey Chenault
2 December 2022
Introduction
My project analyzes crimes reported in the city of Chicago from 2001 to the present. The dataset records a precise geolocation, type, and year for each crime, along with many other descriptive columns. It comes from the Chicago Police Department's CLEAR system.
This project matters to me because my family and I are moving to Chicago, so it is useful to be mindful of certain types of crime and to know which places to avoid living in or visiting alone. I seek to discover where crime is most concentrated, the ratio of crimes to arrests, and the relationship between location and crime type. I am also curious about what crime might look like in the next five years based on patterns in the dataset.
Based on what I had observed in the news over the years, I had a hunch that burglary would be the most common crime. Crime in Chicago was widely talked about, although in my experience people never specified which crimes they meant. I also had a hunch that burglary would be the most reported crime within the most recent five years and that, based on patterns, it would remain the most reported crime over the next five years.
from IPython.display import Image, display
display(Image(filename='crime.PNG'))
Data Explained
- The data used in this analysis can be found at https://catalog.data.gov/dataset/crimes-2001-to-present
- Exploratory data analysis (EDA) showed that the total sample size before cleaning was 65,599 reported crimes.
- The minimum latitude and longitude values suggested that a few entries fell slightly outside the expected range, possibly as geographic outliers; however, because I also had the location and description for each crime, I decided that no further handling was necessary. While checking for nulls, I learned that the columns ID, Case Number, Date, Block, IUCR, Primary Type, and Description all share the same null count, which is also the highest of any column.
- I concluded that some of the data were not needed for my goals, so I eliminated 16 columns to focus on the main columns that describe each crime. The data cleaning process consisted of dropping nulls and dropping unnecessary columns.
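The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on a made-up four-row table, not the notebook's actual cleaning code: the `raw` frame, its values, and the `Ward` column (standing in for one of the dropped columns) are all assumptions.

```python
import pandas as pd

# Hypothetical miniature of the raw data; the real file has many more columns.
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Primary Type": ["THEFT", "BATTERY", None, "ASSAULT"],
    "Location Description": ["STREET", None, "APARTMENT", "RESIDENCE"],
    "Ward": [1, 2, 3, 4],  # stands in for a column not needed for the analysis
})

# Count nulls per column to see where values are missing.
null_counts = raw.isnull().sum()
print(null_counts)

# Drop the unneeded column, then drop rows that are missing the crime type.
cleaned = raw.drop(columns=["Ward"]).dropna(subset=["Primary Type"])
print(cleaned.shape)
```

Passing `subset` to `dropna` removes only rows missing the listed columns, which is one way a cleaned table can still contain nulls in other columns.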
import pandas as pd
crimes = pd.read_csv('crimes_final.csv')
crimes
| | Primary Type | Description | Location Description | Arrest | Year | Location |
|---|---|---|---|---|---|---|
| 0 | ASSAULT | SIMPLE | OTHER | False | 2007 | NaN |
| 1 | HOMICIDE | FIRST DEGREE MURDER | STREET | True | 2021 | (41.917838056, -87.755968972) |
| 2 | HOMICIDE | FIRST DEGREE MURDER | PARKING LOT | True | 2021 | (41.995219444, -87.713354912) |
| 3 | BURGLARY | UNLAWFUL ENTRY | APARTMENT | False | 2023 | (41.952345086, -87.677975059) |
| 4 | BATTERY | AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MIN... | SMALL RETAIL STORE | True | 2023 | (41.737750767, -87.604855911) |
| ... | ... | ... | ... | ... | ... | ... |
| 65494 | MOTOR VEHICLE THEFT | AUTOMOBILE | STREET | True | 2023 | (41.817272712, -87.634557976) |
| 65495 | CRIMINAL DAMAGE | TO PROPERTY | APARTMENT | False | 2023 | (41.764899756, -87.615255831) |
| 65496 | BATTERY | SIMPLE | RESIDENCE | False | 2023 | (41.689143038, -87.670065135) |
| 65497 | MOTOR VEHICLE THEFT | AUTOMOBILE | STREET | False | 2023 | (41.894226067, -87.619222865) |
| 65498 | ASSAULT | SIMPLE | SMALL RETAIL STORE | True | 2023 | (41.976814727, -87.659880317) |
65499 rows × 6 columns
Results
#Load modules
import pandas as pd
#Load for visuals
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
#Setting up notebook to display multiple output in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
#Read in the file
crimes = pd.read_csv('crimes_final.csv')
# Show file
crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Primary Type          65499 non-null  object
 1   Description           65499 non-null  object
 2   Location Description  65084 non-null  object
 3   Arrest                65499 non-null  bool
 4   Year                  65499 non-null  int64
 5   Location              64762 non-null  object
dtypes: bool(1), int64(1), object(4)
memory usage: 2.6+ MB
Pie Chart
Built with Matplotlib's object-oriented interface, this pie chart gives a first look at each crime type's share of all reports across the years.
Pie chart observations:
- This pie chart displays the top crimes since 2001. Theft is the highest at 22.6% of the total, and battery is second at 16.9%.
- My original hunch that burglary would be the most common crime was wrong: burglary accounts for only 3.1% of all reported crimes, which was a surprise.
- Note: There are two "Other" categories in this chart. "OTHER OFFENSE" is a crime type in the dataset itself, while "Other" combines the crime types whose individual share falls below the cutoff (`threshold` in the code), so it is less important.
crime_counts = crimes['Primary Type'].value_counts()
threshold = 0.04
# Keep crime types above the threshold so the most common ones are easier to see
filt_counts = crime_counts[crime_counts / crime_counts.sum() > threshold].copy()
# Combine everything at or below the threshold into a single 'Other' slice
filt_counts['Other'] = crime_counts[crime_counts / crime_counts.sum() <= threshold].sum()
# Pie chart
fig, ax = plt.subplots(figsize=(9, 9))
wedges, _, _ = ax.pie(filt_counts, autopct='%1.1f%%', startangle=130, colors=sns.color_palette('rainbow', len(filt_counts)))
ax.legend(wedges, filt_counts.index, loc="upper left", bbox_to_anchor=(1, 0.7))
ax.set_title('Percentage of Primary Types')
#Display Pie Chart
plt.show()
Arrests vs. Non-Arrests
My next goal is to compare, for each crime type, how many reports ended in an arrest and how many did not.
- Battery has the most arrests (1,764), yet most battery reports (9,330) ended without one. Of the five crime types shown, narcotics reports end in an arrest most often (1,245 of 1,304), while theft has the most reports with no arrest (14,012).
- A bubble chart of the top 10 crime types by year follows the table.
#Creating columns and grouping
arrest_count = crimes.groupby(['Primary Type', 'Arrest']).size().unstack(fill_value=0)
arrest_count.columns = ['No Arrest', 'Arrest']
#Sorting Arrest vs Non-arrest
arrest_count = arrest_count.sort_values('Arrest', ascending=False)
#Display top 5
arrest_count.head()
| Primary Type | No Arrest | Arrest |
|---|---|---|
| BATTERY | 9330 | 1764 |
| NARCOTICS | 59 | 1245 |
| WEAPONS VIOLATION | 913 | 1155 |
| THEFT | 14012 | 819 |
| OTHER OFFENSE | 3232 | 697 |
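Since one goal of the project is the ratio of crimes to arrests, the counts in the table can be turned into an arrest rate per crime type. The sketch below copies the five rows above by hand; the `arrest_rate` column name is introduced here for illustration and is not part of the original notebook.

```python
import pandas as pd

# The five rows shown in the table above, copied by hand.
arrest_count = pd.DataFrame(
    {
        "No Arrest": [9330, 59, 913, 14012, 3232],
        "Arrest": [1764, 1245, 1155, 819, 697],
    },
    index=["BATTERY", "NARCOTICS", "WEAPONS VIOLATION", "THEFT", "OTHER OFFENSE"],
)

# Share of reports that ended in an arrest (arrest_rate is a name introduced here).
total = arrest_count["No Arrest"] + arrest_count["Arrest"]
arrest_count["arrest_rate"] = arrest_count["Arrest"] / total

print(arrest_count.sort_values("arrest_rate", ascending=False).round(3))
```

For these five rows, narcotics comes out at about 95% and theft at about 5.5%, which makes the gap between raw arrest counts and arrest rates easier to see.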
# Filtering for top 10 primary types
top_10 = crimes['Primary Type'].value_counts().head(10).index
top_10_crimes = crimes[crimes['Primary Type'].isin(top_10)].groupby(['Year', 'Primary Type']).size().reset_index(name='Count')
# Create a bubble chart
chart = px.scatter(
top_10_crimes,
x='Year',
y='Primary Type',
size='Count',
color='Primary Type',
title='Top 10 Crime Types: Report Counts by Year'
)
#Display
chart.show(renderer='notebook')
Crime Prevalence by Location Coordinates Using a Bar Chart
The bar chart shows that the coordinate (41.883500187, -87.627876698) (100-148 N State St, Chicago, IL 60602) has the most reported crime. The second most common location is (41.868541914, -87.639235361), at the corner of S Canal St and Roosevelt Rd. Notably, these two locations are roughly 1.2 miles apart in a straight line, near shopping centers, and not near any housing.
crimes['Location'].value_counts().head(10).plot.barh(title='Top 10 Coordinates with Most Crimes')
plt.show()
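The straight-line distance between the two most frequent coordinates can be checked with the haversine formula. The sketch below is an illustration only: the function name `haversine_miles` and the mean Earth radius of 3958.8 miles are assumptions introduced here, and the result is a great-circle distance, not a walking or driving distance.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    earth_radius_miles = 3958.8  # mean Earth radius (assumed constant)
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (
        sin(dlat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    )
    return 2 * earth_radius_miles * asin(sqrt(a))


# The two most frequent coordinates named in the text.
state_st = (41.883500187, -87.627876698)
canal_roosevelt = (41.868541914, -87.639235361)

distance = haversine_miles(*state_st, *canal_roosevelt)
print(f"{distance:.2f} miles")
```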
Zooming in on 2015 to the Present for a Clearer View of the Crimes
Some crimes, such as battery, assault, and burglary, follow consistent patterns or rise repeatedly. The sharp growth in recent years could stem from several independent factors, such as changes in reporting practices, population growth, or broader societal effects.
#Limiting the years
crime_trends = crimes.groupby(['Year', 'Primary Type']).size().unstack(fill_value=0)
crime_trends = crime_trends[crime_trends.index >= 2015]
#plotting
crime_trends.plot(figsize=(12, 6), colormap='rainbow', linewidth=4)
plt.title("Crime Trends from 2015 to 2024", fontsize=15)
plt.xlabel("Year")
plt.ylabel("Number of Crimes")
plt.legend(title="Crime Type", bbox_to_anchor=(1.01, 1), loc='upper left')
plt.grid(axis='y', alpha=0.7)
plt.show()
Bubble Chart
I created an interactive bubble chart to display the top crimes associated with the top five location descriptions.
# Filter for top 5 locations
top_5_loc = crimes.groupby('Location Description').size().nlargest(5).index
filter_location = crimes[crimes['Location Description'].isin(top_5_loc)]
# Create bubble chart
fig = px.scatter(
filter_location.groupby(['Location Description', 'Primary Type']).size().reset_index(name='Count'),
x='Location Description',
y='Primary Type',
size='Count',
color='Primary Type',
title='Top 5 Locations: Crime Incidents by Location and Type',
height=700
)
fig.show(renderer='notebook')
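The same location-by-type relationship can also be read as a table. The sketch below uses `pd.crosstab` on a handful of made-up rows; the `sample` frame and its values are hypothetical and only mimic the two columns used in the bubble chart.

```python
import pandas as pd

# Hypothetical rows that mimic two columns of the crimes table.
sample = pd.DataFrame({
    "Location Description": ["STREET", "STREET", "STREET", "APARTMENT", "APARTMENT", "RESIDENCE"],
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "BATTERY", "BURGLARY", "BATTERY"],
})

# Rows are locations, columns are crime types, cells are incident counts.
counts = pd.crosstab(sample["Location Description"], sample["Primary Type"])

# normalize="index" turns each row into shares that sum to 1.
shares = pd.crosstab(
    sample["Location Description"], sample["Primary Type"], normalize="index"
)

print(counts)
print(shares.round(2))
```

Row-normalized shares make locations with very different report volumes comparable, which bubble sizes alone do not.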
Summary
This project has been instrumental in showing me which crimes are the most prominent and where exactly they have occurred. I was surprised by some of the results and relieved that the more severe crimes made up a lower percentage. Going through each visualization, we can see that crime dropped in 2024 compared to 2023. Theft is by far the most prevalent crime, while battery, criminal damage, and motor vehicle theft are also noteworthy.
Certain crime types, such as Theft and Battery, show consistent activity over the years. A recent surge in some crimes (like Motor Vehicle Theft) suggests a potential shift in criminal behavior or reporting practices. As for my curiosity about the arrest ratio, Theft has a significantly lower arrest-to-incident ratio, which suggests that those cases are hard to solve. Battery and Narcotics show higher arrest ratios, which may reflect a stronger enforcement focus on violent crimes and drug-related offenses. Crimes like Weapons Violations also show a high arrest-to-incident ratio, suggesting targeted policing in these areas. It was interesting to learn that the locations with the most crime were near shopping centers, and it was also useful to learn where not to move.
A future step would be to keep tracking trends over time for the top 5 and top 10 crime categories to anticipate changes in criminal behavior. It would also be interesting to incorporate location-specific demographic and economic data to identify root causes of crime in hotspot areas. Further steps could include a deeper analysis of the coordinates and using this data to predict the next 10 years based on patterns. The only difficulty I had with the project was creating visualizations that would highlight the main points I wanted to discover. I knew I could not accurately display my data with heatmaps or with the visualizations from previous instructions. However, the bubble and pie charts suited my data well.
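One simple way to start on the prediction idea is to fit a straight line to yearly counts and extend it forward. The sketch below is only an illustration of that approach: the yearly counts are made up, not taken from the dataset, and a linear trend is an assumption that real crime data may not follow.

```python
import numpy as np

# Made-up yearly counts for one crime type, for illustration only.
years = np.array([2019, 2020, 2021, 2022, 2023])
counts = np.array([100, 110, 120, 130, 140])

# Measure time from the first year so the fit stays numerically well behaved.
x = years - years[0]

# Fit a straight line (degree-1 polynomial) to the counts.
slope, intercept = np.polyfit(x, counts, deg=1)

# Extend the line five years past the last observed year.
future_years = np.arange(years[-1] + 1, years[-1] + 6)
projected = slope * (future_years - years[0]) + intercept

for year, value in zip(future_years, projected):
    print(int(year), round(float(value)))
```

A projection like this carries no uncertainty estimate, so it is better read as a baseline to compare against than as a forecast.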